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Abstract 

A 30-node signed and directed network responsible for self-renewal and pluripotency of mouse embryonic stem cells 
(mESCs) was extracted from several ChlP-Seq and knockdown followed by expression prior studies. The underlying 
regulatory logic among network components was then learned using the initial network topology and single cell gene 
expression measurements from mESCs cultured in serum/LIF or serum-free 2i/LIF conditions. Comparing the learned 
network regulatory logic derived from cells cultured in serum/LIF vs. 2i/LIF revealed differential roles for Nanog, Oct4/ 
Pou5f1, Sox2, Esrrb and Tcf3. Overall, gene expression in the serum/LIF condition was more variable than in the 2i/LIF but 
mostly consistent across the two conditions. Expression levels for most genes in single cells were bimodal across the entire 
population and this motivated a Boolean modeling approach. In silico predictions derived from removal of nodes from the 
Boolean dynamical model were validated with experimental single and combinatorial RNA interference (RNAi) knockdowns 
of selected network components. Quantitative post-RNAi expression level measurements of remaining network 
components showed good agreement with the in silico predictions. Computational removal of nodes from the Boolean 
network model was also used to predict lineage specification outcomes. In summary, data integration, modeling, and 
targeted experiments were used to improve our understanding of the regulatory topology that controls mESC fate 
decisions as well as to develop robust directed lineage specification protocols. 
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Introduction 

mESCs are derived from the inner cell mass of a developing 
blastocyst and can be propagated indefinitely in culture. Cultured 
mESCs can contribute to all adult cell populations, including the 
germ-line. Human ESCs have similar in-vitro differentiation 
potential. It is now established that somatic cells can be 
reprogrammed into induced pluripotent stem cells (iPSCs) using 
simple combinations of transcription factors (TFs) [1-3] or other 
methods. Mouse and human iPSCs closely resemble ESCs, 
potentially removing ethical and tissue rejection barriers to 
applications in regenerative medicine. In order to harness the full 
potential of stem cell therapeutics there is a pressing need to 
further characterize the regulatory topology that controls pluripo- 
tency as well as commitment and differentiation to specific 
lineages. Pluripotency is maintained by a densely interconnected 
network of auto- and cross-regulatory TFs and other transcription 
regulators. These TFs and regulators promote the expression of 
other pluripotency genes and simultaneously suppress the expres- 
sion of differentiation inducers [4] . To dissect the ESC regulatory 



topology, genome-wide high-throughput technologies such as 
cDNA microarrays, RNA-seq, ChlP-seq, immuno-precipitation 
followed by mass spectrometry (IP-MS) proteomics and phospho- 
proteomics, inhibitory RNA (RNAi) screens, as well as other 
emerging technologies have been applied. However, it remains a 
challenge to integrate multiple datasets, obtained from distinct 
sources and molecular regulatory layers into a systems level view of 
ESC regulation. Such data integration is necessary in order to 
build reliable predictive regulatory models that would provide a 
global view of the entire system. While static network diagrams 
can provide snapshot views of the information processing that 
controls cell fate decisions, it is necessary to develop regulatory 
models that capture the dynamical behavior of key regulatory 
components over time. 

In recent years, several stem cell-centered dynamical models 
have been developed from low- throughput functional studies. 
Most models employed ordinary differential equations (ODEs) and 
simulate interactions among a small number of well-studied TFs 
[5—7]. For example, a stochastic ODE model that linked Nanog, 
Oct4/Pou5fl and Sox2 to an osteoblast differentiation circuit 
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Author Summary 

For this study we first constructed a directed and signed 
network consisting of 15 pluripotency regulators and 15 
lineage commitment markers that extensively interact to 
regulate mouse embryonic stem cells fate decisions from 
data available in the public domain. Given the connectivity 
structure of this network, the underlying regulatory logic 
was learned using single cell gene expression measure- 
ments of mESCs cultured in two different conditions. With 
connectivity and logic learned, the network was then 
simulated using a dynamic Boolean logic framework. Such 
simulations enabled prediction of knockdown effects on 
the overall activity of the network. Such predictions were 
validated by single and combinatorial RNA interference 
experiments followed by expression measurements. Final- 
ly, lineage specification outcomes upon single and 
combinatorial gene knockdowns were predicted for all 
possible knockdown combinations. 

comprised of three additional TFs showed that cells can jump from 
one state to another if enough noise is added to the system [8] . In 
general, predictions made from computational dynamical models 
of embryonic stem cell have not been extensively experimentally 
validated until very recently. In the past year, few other 
comprehensive studies that integrated various datasets and 
constructed larger models of the ESC regulatory circuitry have 
emerged [9-12]. For example, Dowell et al. [11] integrated gene 
expression, ChlP-seq, protein interactions, RNAi screens and 
epigenetics markers to build a Bayesian network model of 
pluripotency genes. Their main focus was comparing human 
and mouse ESCs and their networks models are static. One of the 
advantages of their approach is that the network was not 
determined a priori which allowed the discovery of novel self- 
renewal and pluripotency components. Dowell et al. also present a 
database that is similar to our ESCAPE database [13] called 
StemSite. Lee and Zhou [10] combined ChlP-seq, gene expression 
and motif finding data to identify pairs of transcription factors that 
potentially work together within the pluripotency circuitry. They 
established 27 interactions between 14 factors. Many of the 
interactions they identified are consistent with our study. In 
another similar study Dunn et al. [9] implemented a data 
constrained Boolean model that connected 1 2 transcription factors 
through 16 interactions to suggest the minimal possible circuitry 
required to maintain pluripotency of mESC. In contrast with these 
studies, our study has primary gene expression data from single 
cell mESC, we also consider lineage markers and predict lineage 
propensity after perturbations, and provide extensive experimental 
validation with double and triple knockdowns followed by gene 
expression profiling of both pluripotency regulators and lineage 
markers. 

In order to address the need for multi-level data integration and 
broader experimentally validated dynamical models, we first 
extracted a signed and directed network from published ChlP- 
seq and knockdown followed by expression studies. All included 1 5 
pluripotency network nodes and their interactions are supported 
by ChlP-seq data providing TF/target-gene binding evidence, as 
well as significant mRNA expression change following depletion or 
over-expression of a given TF. The ChlP-seq TF/target-gene 
binding evidence provides the directionality of the edges; whereas 
the knockdown or over-expression followed by genome-wide 
expression evidence establishes positive or negative regulatory 
edge sign. The underlying network regulatory logic was then 
learned using single-cell gene expression data collected with a 



microfluidic device. The learned model was validated by 
comparing predictions from in silico single or combinatorial node 
knockdowns to experimental single or combinatorial RNAi 
knockdowns of selected nodes followed by quantitative PCR 
expression level measurements of the remaining network compo- 
nents. Finally, lineage specification outcomes of single and 
combinatorial perturbations were predicted for all possible 
knockdown combinations. 

Results/Discussion 

Construction of a signed and directed network 
composed of pluripotency regulators and lineage 
markers 

We first extracted a signed and directed pluripotency network of 
mESCs consisting of 15 transcription regulators and 15 lineage 
markers from the ESCAPE database (Figure 1) [14]. The 
ESCAPE database contains TF/ target-gene interactions extracted 
from a collection of ChlP-seq studies applied to mESCs. In 
addition, the ESCAPE database also contains regulatory causal 
interactions connecting perturbed TFs to affected target genes 
based on mRNA expression. Such interactions were extracted 
from the loss-of-function (LOF) knockdown /knockout of TFs, or 
gain-of-function (GOF) over-expression followed by global tran- 
scriptional profiling measured by microarrays or RNA-seq 
database tables in ESCAPE. The 15 pluripotency regulators were 
selected only if both ChlP-seq and LOF/GOF evidence was 
available; whereas the selection of the 15 lineage markers was 
determined by expert curation. The network nodes represent 15 of 
the mostly well-studied pluripotency TFs and 15 of the most 
established early differentiation lineage markers. The selected 
factors also have phenotypic evidence that is important for 
sustaining normal mESC functions based on cell phenotype 
profiling after knockdown or over-expression. In addition most of 
the 15 selected pluripotency factors are well studied where some 
also were shown to play central role in iPSC reprogramming. The 
15 lineage markers were selected to be the most established 
markers with having a relatively even representation for each 
lineage. The list of studies and the criteria used for inclusion of 
network nodes are described in Table SI. The initial network 
created from the high-content studies had few links with 
conflicting signs. These conflicting signed links were resolved by 
citing specific publications from the literature (Table S2). The final 
network contains 30 nodes, 106 links, 10 positive auto-regulatory 
feedback loops, as well as 26 positive and 13 negative other 
feedback loops (Figure SI). 

Measuring gene expression in single cells 

The 30-node pluripotency network described above can suggest 
some novel regulatory mechanisms governing mESCs regulation. 
However, the directed and signed graph representation still lacks 
crucial regulatory information. Specifically, the transition func- 
tions (logic gates) that tie the activity of each node to the activities 
of its upstream regulators are not specified. In other words, the 
representation does not provide information on how the combi- 
natorial state of upstream parent nodes determines the expression 
and activity of a given downstream node. The coding of 
transcriptional regulation to Boolean logic gates is a mathematical 
idealization and abstraction of the complex biochemical processes 
of transcriptional regulations. As such, a Boolean model can 
capture the essence of the regulatory relationships but may still 
lose important details. In addition, the network was created from 
data collected in different laboratories, under different experi- 
mental conditions and over different time scales. Moreover, all 
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Figure 1. Signed and directed network extracted from ChlP-seq and knockdown or over-expression followed by genome-wide 
expression. Edges are established where there is evidence for transcription factor binding to the gene proximal region and also change in 
expression after knockdown or over-expression. The 15 pluripotency nodes are color coded in light blue, and the lineage markers are color coded for 
the four major early differentiation lineages. Red diamond-heads denote inhibition and black arrowheads denote activation. Dashed lines connect 
pluripotency regulators to lineage markers and solid arrows connect pluripotency regulators to other pluripotency regulators. 
doi:10.1371/journal.pcbi.1003777.g001 



data were from bulk populations of mESGs and there is evidence 
for cell-to-cell heterogeneity in mESCs [15,16]. Hence, some links 
may be inaccurate, missing, or present in different contexts in 
individual cells, or in certain subpopulations. We therefore 
attempted to refine the network topology by learning the transition 
functions regulatory logic, and as a result enhancing the 
information about the network links based on gene expression 
measurements in single mESCs. 

Expression levels were measured using the Fluidigm micro- 
fluidic quantitative RT-PCR platform for single cell gene 
expression profiling [17]. In those experiments the expression of 
96 genes was measured in 96 individual cells, including the 30 
genes/ nodes in the network assembled from ESCAPE (Figures 2, 
3, and S2). The remaining 66 genes measured in single cells 
represent early differentiation markers and controls (Table S3). 
Measuring 96 individual genes in 96 single cells is a much more 
direct method to learn the regulatory logic of the pluripotency 
network. Analysis of data collected from a combined bulk of cells 
across a set of samples is less direct than single cell data because 
the regulatory relationships are masked by population averages. 



Two culture conditions were employed, +serum/LIF (serum/LIF) 
and -serum/+2i/LIF (2i/LIF). Serum can be a source of 
variability and with serum-free 2i media, two pharmacological 
inhibitors targeting the kinases MEK and GSK3(3 are sufficient to 
maintain pluripotency [18]. Both conditions benefit from the 
inclusion of LIF. However, it is still not completely clear how such 
variable conditions alter the connectivity of the core transcrip- 
tional network that maintains pluripotency. The various condi- 
tions when compared can give us clues about the required core 
circuitry that is needed to maintain pluripotency [19], as well as 
the subtle differences that are expected to exist between 
conditions, particularly at the single cell level. 

Heterogeneous gene expression could arise from stochastic 
fluctuations or from natural global fluctuations in the pluripotent 
state [20]. As expected, a comparison of gene expression 
heterogeneity using indicator dispersion indices revealed less 
heterogeneity in the 2i/LIF condition as compared to serum/ 
LIF (p<10~ 5 , one-tailed paired t-test, Figure S2). This can be seen 
in the generally narrower distributions for most mRNAs in the 2i/ 
LIF condition (Figure 2). Our results show that Nanog and Zfp42/ 
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Figure 2. 30 mRNA gene expression measurements in 96 single mESCs cell using the Fluidgm single cell microfluidic device. 

Histograms of gene expression Ct-values from RT-PCR data collected from single mESCs in -i-serum/LIF (serum/LIF, blue) and -serum/2i/LIF (2i/LIF, 
red). Ct-values were normalized using the housekeeping gene Gapdh. x-axis represents -ACt values and y-axis represents the percentage of total cells. 
Alpha blending is used to show the results from the other condition for clearer visual comparison. 
doi:10.1371/journal.pcbi.1003777.g002 



Rexl expression is more homogeneous in the 2i/LIF condition 
compared with the serum/LIF condition, consistent with a 
previous study [21]. More globally, with the exception of Tbx3, 
Fgf5 and Otx2 (highlighted in bold and blue in Figure S2), gene 
expression patterns are generally similar in the two conditions in 
terms of expression distributions (Figure 2). In agreement with 
previous observations, the 2i/LIF media stimulates the Wnt 
pathway, known to regulate the expression of Otx2 and Fgf5 
[22,23]. Also, certain lineage-specific genes, namely Fgfr2, Gata4, 
T, Gata6, Handl and Ncaml display overall lower expression 
levels in 2i/LIF (Table S5). To address the possibility that these 
observations may be unique to the type of mESCs we used in our 
experiments, we compared the Esrrb expression level distribution 
to data from another study [24] (Figure S2 and Text SI). In both 
cases, Esrrb mRNA levels are bi-modal and similarly distributed in 
single cell populations. In addition, we demonstrated previously 
that Esrrb protein levels are comparable in the mESCs we used 
and wild-type cells [25]. Since the distribution of Esrrb mRNA 
expression is similar in the Esrrb removed (Esrrb_R), CCE and 



Nanog removed (Nanog_R) mESCs using the same Fluidigm 
BioMark platform, we believe that Esrrb levels are well controlled 
in the Esrrb_R cell line and the results can be generalized to wild- 
type mESCs. 

Importantly, single cell expression profiles show bimodal 
distributions for numerous genes (Figures 2 and S2). Therefore, 
we reasoned that a Boolean framework for learning the underlying 
network regulatory logic by assigning values of 1 or 0 to high or 
low expression states, respectively, was a valid initial approach. 
Boolean modeling of gene-regulatory networks had first been 
proposed in the late 1960s [26]. Recently this approach regained 
popularity with the availability of more detailed systems level 
experimental data and networks [27]. In order to implement a 
Boolean representation, continuous Ct-value mRNA expression in 
single cells were converted to binary values (1 or 0) using a single 
K-means clustering step for all genes with K = 2 (Figure 3A-B and 
Figure S2). Delta Ct-values are used instead of the exponential of 
the delta Ct since this way it is more convenient for binarization. 
Hierarchical clustering of continuous (Figure 3C-D) and binarized 
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Figure 3. (A-B) Plots of normalized Ct-values partitioned into two clusters. (C-D) Hierarchical clustering of gene expression in single mESCs 
using normalized Ct-values from RT-PCR data measuring levels of 15 pluripotency genes (orange) and 15 lineage-specific genes (purple) in 96 cells 
cultured in serum/LIF (C) and in 2i/LIF (D). The x-axis represents individual cells and the y-axis represents genes. Yellow represents 'high' and blue 
represents 'low' levels of gene expression. (E-F) Association of 30 genes based on gene expression in single mESCs cultured in serum/LIF or 2i/LIF. 
Numeric values in the color bars represent the distance score calculated as 1 - Pearson-Correlation Coefficient (PCC). Average-linkage, Euclidean- 
distance-based hierarchical clustering was performed the gene expression data. 
doi:10.1371/journal.pcbi.1003777.g003 



Ct-values (Figure S2) of the 30 network gene-products across the 
96 cells show that there are no distinct single cell states but rather a 
level of variability across most cells. We actually expected to 
observe distinct subpopulations that represent few cell states. The 



bimodality in single cells has been observed before for single genes 
and the theory of having few distinct subpopulations is attractive 
from a modeling perspective. Yet the results confirm the 
bimodality of gene expression but not the idea of few distinct 
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Figure 4. Learned Boolean networks. (A-B) Networks with learned Boolean logic transition functions consisting of 30 genes/proteins. Learning 
was achieved using the serum/LIF (A) or 2i/LIF (B) single cell data. Light cyan nodes represent genes; gray squares represent learned regulatory logic 
transition functions. The shadowed inset box exemplifies one such learned transition function upstream of Gata6. The sign -> represents 'NOT', A 
represents 'AND' and v represents 'OR' logic operators. Boolean functions for all genes are available in Supplementary Information. Links from 
upstream parent nodes appearing in more than 90% of equally well-fitted Boolean functions are colored in green (activation) and red (repression). 
Other links are not shown. Novel links that resulted from the learning process are highlighted in dark red. (C) Overlap between the networks learned 
using data from mESCs cultured in serum/LIF or 2i/LIF. Links that are shared in the two conditions are in gray. Non-shared links are solid or dashed 
based on their source (solid for serum/LIF and dashed for 2i/LIF). 
doi:10.1371/journal.pcbi.1003777.g004 



subpopulations within mESCs. Such variability may be associated 
with differentiation priming, perhaps to distinct lineages depend- 
ing on the exact constellation of ON and OFF regulatory nodes in 
single cells. Priming toward different lineages is supported by the 
observation that lineage marker genes of the same lineage cluster 



together due to their high correlation coefficients of expression 
across the 96 single cells (Figure 3E-F). For example, the ectoderm 
marker genes Rail, Ncaml and Otx2 form a cluster and the 
trophectoderm marker genes Tead4, Handl, Eomes and Cdx2 
form another cluster. 
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Learning the regulatory logic functions inter-connecting 
pluripotency components from single cell gene 
expression measurements 

To learn the Boolean transition functions upstream of each 
network node, we applied an exhaustive symbolic search that is 
limited to the AND, OR and NOT logic operators and 
composition of these operators with the restriction of allowing 
each input to feed into only one of gate [28]. Based on the network 
topology extracted from the ESCAPE database and the single cell 
data measured with the microfluidic device, we attempted to fit all 
combinations of the three logic operators for each node given its 
parental inputs. The learning process used the single cell mRNA 
levels measured for each gene together with the original topology 
extracted from the ESCAPE database to derive a truth table for 
each Boolean transition function composed of AND, OR and 
NOT operators (Figures 4, S3, and Table S4). We assumed that 
single cell expression values reflect causal relationships between 
upstream regulators and downstream targets. Allowing only AND, 
OR and NOT gates, without nesting them, limits the search space 
to make the computation sufficiently efficient. However, such 
simplification may miss important biologically relevant Boolean 
functions. For example, exclusive OR (XOR) gates likely exist 
within mammalian cellular gene regulatory networks. In addition, 
threshold Boolean functions, which are Boolean functions that 
require at least several but not specific inputs to be active in order 
to turn on the target output are also missed by our symbolic 
search. The degeneracy of the possible functions listed in Table S4 
suggests that such canalizing functions are likely present in the 
pluripotency circuits and finding them may reduce the degeneracy 
and complexity of the ensemble of dynamical Boolean models we 
obtained. Although there can be many Boolean functions that fit 
the same experimental observations, we attempted to find those 
that are most consistent with the direction and sign of edges of the 
network topology extracted from the ESCAPE database. For some 
input/output relationships many Boolean functions satisfied the 
input/output relationships (Table S4). For each gene, if no 
consistent function emerged with the initial topology, we 
performed a refinement process. The refinement process starts 
by testing all possible transition functions using a single regulator, 
sampling all possible regulators, then pairs and so on, until a 
defined threshold of input/output agreement was reached. For 
each gene, if no consistent function emerged, we performed the 
pruning refinement step by systematically removing all parental 
input links one-by-one and re-sampling all possible transition 
functions. This refinement and pruning procedure was executed 
recursively until a defined threshold of input/ output agreement 
was reached. Specifically, if none of the combinations of plausible 
parents from the input network fulfilled the criteria, where is the 
number of single cell expression vectors, the algorithm exhaus- 
tively introduces single links from all the 15 pluripotency nodes. If 
still no single reassigned of a parent can explain the behavior of the 
downstream target by satisfying the criteria, we attempted all pairs 
of parents. For only four nodes from the serum/LIF learned 
network, and two nodes from the 2i/LIF learned network this 
process was necessary (Table S4). Auto-regulatory interactions are 
not considered in the dynamic Boolean models to keep the model 
dynamics simple. 

After learning, we obtained a large ensemble of Boolean 
network models where, in some cases, many distinct transition 
functions can satisfy the input/ output relationships almost equally 
well. The average number of feedback loops after sampling 1 00 
network models and without considering auto-regulatory loops, 
was 14.39, including an average of 9.38 positive and 5.01 negative 
feedback loops (Figure S3). This number of feedback loops is 



significantly lower than the 39 feedback loops found in the initial 
network prior to learning the transition functions (p-value < 10 65 , 
t-test). The initial topology of the network created from the 
ESCAPE database has many more links than the pruned network, 
after fitting this network to be in concordance with the single cell 
data. The pruning and refinement step eliminates links iteratively, 
from the original topology of the initial network, with the goal of 
finding transition functions that are consistent with the original 
topology as well as with the single cell data. We next enumerated 
all feedback loops present in the sampled Boolean network models 
and ranked positive and negative loops based on their occurrence 
(Figure S3). The most common positive feedback loop in randomly 
selected models is the mutual activation of the Oct4/Pou5fl gene 
by Nanog and the Nanog gene by the transcription factor Oct4/ 
Pou5f 1 ; whereas, the most common negative feedback loop is the 
activation of the gene Tcf3 by Oct4/Pou5f 1 and the inhibition of 
the Oct4/Pou5fl genes by the Tcf3 transcription factor (Figure 
S3). 

While it is established that 2i/LIF can replace serum/LIF to 
maintain the mESC ground pluripotency state [18], detailed 
mechanisms responsible for culture-dependant similarities and 
differences remain elusive. Comparing the networks learned from 
both conditions, we found that the two networks re generally 
consistent with 21 out of 30 nodes having exactly the same 
connectivity. While the original topology of the serum/LIF and 
2i/LIF networks is identical, during the pruning and refinement 
stage, some links can be removed, sign switched or added to obtain 
transition functions that are consistent with the single cell gene 
expression data. Therefore, the connectivity of the learned serum/ 
LIF and 2i/LIF networks are slightly different from each other. 
We observed differences in the predicted regulation of the genes 
Klf4, Tbx3,Jarid2, Fgf5, Gata4, Handl, Otx2, GU2 and Ptpnll 
(Figure 4). In addition, the regulation by the key TF genes Oct4/ 
Pou5fl, Nanog, Sox2, Esrrb and Tcf3 all showed some level of 
difference when comparing the two conditions (Figure 4C). For 
example, Nanog appears to be a positive regulator of Klf4 and a 
negative regulator of Gata4 and Handl only in the 2i/LIF 
condition, while appearing as a negative regulator of Fgf 5 under 
the serum/LIF condition. The family of learned Boolean functions 
confirmed known regulatory interactions and identified new ones. 
For example, Oct4/Pou5fl is positively regulated by Sox2 and 
Nanog, which is known [29-31]. In turn, these links are reinforced 
by positive regulation of Sox2 and Nanog by Oct4/Pou5fl. 
Furthermore, Esrrb was singled out as an activator of Klf4; 
whereas before learning, Nanog was a second potential Klf4 
regulator. Direct activation of Klf4 by Esrrb may explain the 
ability of Esrrb to replace Klf4 for iPSC reprogramming [32]. 
Similarly, the repressive regulation of NrObl (Daxl) by Tcf3 and 
Sall4 was highlighted after refining the network topology with 
single cell data. Based on out-degree centrality (direct targets per 
TF), Oct4/Pou5fl emerges as the master regulator of the entire 
circuit (Table S6). This is not surprising since previous studies have 
shown that Oct4/Pou5fl has the strongest effect on gene 
expression following its depletion in mESCs, and Oct4/Pou5fl 
is the most critical factor for successful iPSC reprogramming 
[25,29,33,34]. 

Four new interactions were suggested from the learned model 
(dark red links in Figure S3). These are the inhibitory interactions 
from Oct4/Pou5fl to Handl, Gata4 and Ptpnll, and a negative 
link from NrObl (Daxl) to Jarid2. Handl is associated with 
trophectodermal commitment. The inclusion of an inhibitory link 
from Oct4/Pou5fl to Handl is consistent with the trophecto- 
dermal phenotype observed after Oct4/Pou5f 1 depletion [35] . In 
addition, it has been reported that Oct4/Pou5fl binds to Handl 
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and Gata4 promoters in hESCs [36]. We confirmed the repressive 
effect of Oct4/Pou5fl on Handl, Gata4 and Ptpnll expression 
by depleting Oct4/Pou5fl using two separate shRNAs followed by 
RT-PCR (Figure S3). In addition, motif analysis revealed potential 
Oct4/Pou5fl binding sites within promoter regions of the three 
genes (Figure S3). While such data suggest direct regulation by 
Oct4/Pou5fl for these genes, it is possible and still consistent with 
the model, that there are additional unidentified intermediates. 
Taken together, our results suggest that the learning framework 
can identify new regulatory relationships that can be experimen- 
tally validated. Although the network model topology does not 
consider protein-protein interactions, the learning process auto- 
matically captures functional relationships between multiple 
pluripotency TFs that are also likely to physically interact. The 
AND logic operator can indirectly suggest physical binding 
interactions. Among the 20 known protein-protein interactions 
that we have collected from prior studies that interconnect the 1 5 
pluripotency regulators, 15 are supported by at least one AND 
operator (Table S7A). This is a high over-representation compared 
to random assignments of logic gates (p<0.001, one-tailed Fisher 
exact test). 

Conversely, we rank the protein -protein pairs connected by an 
AND gate by their co-occurrence frequency in all learned Boolean 
functions, among the top 10 ranked protein-protein pairs, 6 were 
previously reported as interaction partners in large-scale inter- 
actome studies and 4 were reported in low-throughput studies 
(Table S7B). Therefore, the optimized network with the learned 
logic functions is capable of predicting potential protein-protein 
interactions between TFs. Nevertheless, an AND gate does not 
necessitate a physical interaction between TFs. Since the top two 
ranked AND relationships were not supported by direct physical 
interaction studies, we decided to conduct a co-IP experiment to 
test one of these, namely the interaction between Nanog and Sox2. 
Nuclear extract from a mESC line expressing an epitope-tagged 
Nanog was used to directly test this potential interaction (see Text 
SI and Figure S3). The results were negative, suggesting that 
Nanog and Sox2 may not interact directly. This remains consistent 
with the model because an AND gate only requires that two or 
more TFs cooperate to regulate the same target genes by co- 
binding to promoter or enhancer regions but those factors do not 
have to physically interact. 

Single and combinatorial in silico perturbations, model 
dynamics and experimental validations 

The refined 30-node network with the learned regulatory logic- 
gate relationships can be used for dynamical Boolean simulations. 
However, whether such simulations are predictive requires 
experimental validation. To this end, we first performed in silico 
and subsequently, experimental knockdowns of Esrrb, Oct4/ 
Pou5fl and Nanog individually and in all possible double and 
triple combinations (Figure 5A). The learned ensemble of Boolean 
dynamical models was used to make predictions about the network 
response to perturbations. Many Boolean networks consisting of 
30 genes/nodes and a set of Boolean functions were sampled 
randomly from all learned Boolean functions calibrated through 
the learning workflow. Computational simulations were achieved 
by forcing a node(s) into a stable OFF state. Starting with 100 
random initial condition for each sampled network, step-wise 
simulations were performed and the resultant stable values for all 
network nodes (steady-state attractors) were obtained (Figure 5B, 
Text SI). Technically, simulations were performed using discrete 
Boolean dynamics with synchronous updating in 30 steps with 10 
sampled networks, and 100 random initial conditions. In most 
cases, steady states were reached within 3 to 9 simulation steps. In 



the attractor space, on average 83% of the time there was a 
dominant attractor that is achieved from the 100 random initial 
conditions. We then performed the same knockdowns in mESCs 
and measured alkaline phosphatase activities (Figure S4) as well as 
changes in mRNA and protein levels to verify knockdown 
efficiencies (Figure S4). Alkaline phosphatase (AP) is a pluripotency 
stem cell marker, whereas loss of AP activity, as determined by the 
AP assay, is used to access differentiation of mESCs. We also 
measured the expression levels of the 30 network genes/nodes 
using RT-PCR in bulk mESC population, with each experiment 
performed in duplicate (Figure 5C). To compare the binary 
vectors from the in silico perturbations to continuous experimental 
Ct-values representing the mRNA levels in the cell population, we 
calculated post-knockdown gene expression changes relative to the 
unperturbed condition from both simulations and experiments 
(Figure 5B-D). A logistic function (see Text SI) was used to 
compute the consistency between the predicted and measured 
expression changes, including fold-change magnitudes (Figure 5D, 
Table S8). Overall, experimental measurements following single or 
combinatorial knockdowns showed significant agreement with the 
in silico predictions (p-value <10~ 15 compared to a random 
predictor, see Text SI for more details) (Figure 5D). 

All single and combinatorial in silico as well as experimental 
knockdowns repressed some but not all core pluripotency 
components and activated selective differentiation markers con- 
sistent with published experimental results [25,37,38]. In addition, 
in silico simulations showed that knocking down Oct4/Pou5fl has 
the most significant effect, in agreement with previous experimen- 
tal results [25,29,33,34]. However, our dynamical model is 
unsuccessful in predicting the correct activity values for Cdx2, 
Fgf5, Tcf3 and Zfp281 for various knockdown conditions. We 
adopted an alternative strategy to resolve these conflicts by 
utilizing all data sources to train the model: 1) the initial network 
topology; 2) the single cell gene expression measurements; and 3) 
the measurements of the network components after the various 
knockdowns. The re-learning process resulted in re-wired logic for 
Cdx2 and Fgf5 that resolved the conflicts for these two nodes 
(Figure S4G-H). For Tcf3 and Zfp281, the re-learning process did 
not improve the predictions. It is possible that additional upstream 
regulators are required to explain these discrepancies. For 
example, Tcf3 may be differentially regulated by the (3-catenin/ 
Wnt signaling pathway in mESCs. Alternatively, Tcf3 and Zfp281 
could be differentially regulated in mESCs depleted of Oct4/ 
Nanog/Esrrb. The inconsistent behavior of Tcf3 in the model 
versus the empirical observations in our experiments is intriguing. 
The model predicts that knockdowns of Oct4/Nanog/Esrrb 
would down-regulate Tcf3. However, experimental knockdowns 
resulted in up-regulation of this factor. Increased Nanog and Oct4/ 
Pou5fl expression after Tcf3 depletion was previously reported 
[39]. Down-regulation of Tcf3 following depletion of Oct4/ 
Pou5fl or Nanog has also been demonstrated [40]. The latter 
result is consistent with our in-silico predictions while the former is 
not. Training the model with single cell data assumed concordance 
between mRNA levels and TF activities. For Tcf3 and Zfp28 1 this 
assumption may be incorrect. Indeed, in our previous studies after 
depletion of Nanog, we observed significant discordances between 
mRNA and encoded protein levels [41]. In addition to the Nanog, 
Oct4/Pou5fl and Esrrb single and combinatorial knockdowns we 
also depleted Jarid2 and observed significant agreement between 
predicted and experimental results (Figure S4). 

Next, we asked if prediction accuracy is mostly the result of 
calibrating the initial network topology with the single cell data, or 
if it is already largely embedded in the topology extracted from the 
ESCAPE database, with the single cell data contributing only 
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Figure 5. Comparison between computational in silico and experimental knockdowns followed by expression measurements. (A) 

Design of in silico and experimental knockdowns. (B) Simulation results of computational knockdowns. Red represents up-regulation, blue represents 
down-regulation and white represents no change with regards to the unperturbed condition. (C) Results of experimental knockdowns followed by 
mRNA measurements of all network nodes, where each experiment was repeated twice. Colorbar illustrates log-scaled fold-change. Red represents 
up-regulation and blue represents down-regulation with regard to the unperturbed condition. (D) Comparison of computational and experimental 
knockdowns (see Text SI). Colors correspond to discordance score defined in the objective function in the supplementary material. Concordant 
results are colored in yellow while discordant results are colored in blue with gradients representing the degree of (dis)agreeability. 
doi:1 0.1 371 /journal.pcbi.1 003777.g005 



minor tuning. To address this question, we randomly and 
sequentially flipped binary single cell gene expression values or, 
similarly reassigned network links from ESCAPE (see Text SI). 
We demonstrate this methodology using a toy network with 
simulated data (Figure S7). Given any directed network and single 
cell gene expression data (Figure S7), the regulatory logic can be 



learned. Then, when in-silico knockdowns are performed, 
computational knockdowns can be compared with experimental 
knockdowns. Each entry in the comparison heatmap was 
calculated as 1 - discordance-score. Note that the score didn't 
reach 100 despite the ideal example, due to the logistic function 
used. Both shuffling single cell values, or links from the original 
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topology of the network, reduced the predictive power of the 
model to approximately equal extents, demonstrating that both 
sources of data contribute to prediction accuracy (Figure S4). 

Because the predictions obtained from the dynamical model are 
generally reliable, we simulated all possible single or combinatorial 
knockdowns to predict mESC lineage commitment outcomes 
(Figure 6, Text SI). The 15 lineage-specific markers in the 
network may be limited and biased because lineage specification 
involves more genes. Therefore, we linked the simulated 
knockdowns to a larger set of 40 lineage marker genes (Text SI). 
We compared predictions that are based on the resultant state of 
the network core components after simulated knockdowns, with 
predictions that are based solely on the additive effect of the direct 
targets of individual and combinations of TFs without simulations. 
Interestingly, predictions made through the Boolean modeling 
approach were more consistent with lineage commitment knowl- 
edge compared with the direct TF target-based predictions; for 
example, forcing Oct4/Pou5fl into a stable OFF state in our 
dynamical model results in more pronounced predicted trophec- 
todermal differentiation when compared with predictions that are 
based solely on Oct4/Pou5fl target genes. Trophectoderm 
induction after Oct4/Pou5fl depletion is well-established [42]. 
We also examined lineage marker gene levels after depleting Esrrb 
or Jarid2. For Esrrb, simulation-based predictions were more 
consistent with experimental data showing greater effects on 
neuroectodermal than trophectodermal or mesodermal differen- 
tiation. In the case of Jarid2, both simulation-based and direct 
targets lineage predictions were consistent with our experimental 
observations which point to primitive endoderm differentiation 
(Figures 6 and S5). Overall, the Boolean model appears to resolve 
indirect effects of TFs on lineage commitment. The model may be 
most useful for prioritizing combinations of knockdowns not easily 
tested in high-throughput. 

Consensus and unique interactions across four learned 
models and two culture conditions 

Finally, we constructed two additional dynamical models 
learned from two recently published single cell gene expression 
datasets collected from CCE and Nanog removed (Nanog_R) 
mESCs using the same Fluidigm platform in the same laboratory 
[43] . To compare the four dynamical models constructed from the 
various types of mESCs we counted unique and shared regulatory 
interactions (Figure S6). Interestingly, we identified 41 unique 
interactions present only in one model under one condition, and 
13 consensus interactions shared by all four models and 
conditions. In addition, among all the learned interactions from 
the four models, there are 42 interactions that appear in 3 out of 4 
models, and 62 interactions that are found in 2 out of 4 models. 
Hence, on average each model contains —10 unique interactions 
and —40 interactions shared by at least one another model. One of 
the consensus interactions is Nanog regulation of Zfp281, 
reinforcing the importance of this interaction. Likewise, Oct4/ 
Pou5fl consistently positively regulates Zfp42/Rexl and Sall4. 
Intriguingly, the positive feedback loop between Oct4/Pou5fl and 
Sox2 exists in all models in serum/LIF but not in 2i/LIF, 
suggestive of culture-specific regulatory interactions. Other differ- 
ences can stem from cell-line differences, differences in the times 
the experiments were conducted, as well as instrument noise. 
Overall we consider the agreement among all 4 models to be 
relatively robust. 

Conclusions and summary 

For this study we first developed a directed and signed network 
consisting of 15 pluripotency and self-renewal regulators 



connected to 15 lineage marker genes. We then learned the 
underlying regulatory logic among network components utilizing 
single cell gene expression data. Gene expression in single cells 
uncovered some differences in mESCs cultured under serum/LIF 
versus 2i/LIF conditions. Characterizing such culture-dependent 
differences is important for understanding the enhanced iPSC 
reprogramming efficiency in the 2i/LIF condition [44]. Expres- 
sion in single cells was found to be mostly bimodal and this fits well 
with a Boolean modeling framework. The utility of such Boolean 
modeling approach is the ability to perform predictions of network 
behavior after in silico perturbations, and the ability to control and 
constrain the free parameter space. The consistency such in silico 
perturbations were compared to experimental combinatorial 
shRNA perturbations. The good agreement between the model 
and the experimental validation suggest that the model does 
capture some of the real dynamics of the pluripotency and self- 
renewal circuit. Nevertheless, many challenges remain. Our model 
is binary, with genes and their products merged into single nodes 
assuming a direct correlation between TF activity and mRNA 
expression. We observed such correlations for Nanog, Oct4/ 
Pou5fl and Esrrb (Figure S4), but for other TFs this may not be 
the case. Furthermore, while there is ample evidence describing 
how pluripotency TFs regulate lineage-specific genes, little is 
known about lineage regulator-mediated suppression of the 
pluripotency circuit. In addition, gene expression is controlled by 
protein complexes and epigenetic modifications not explicitly 
incorporated into the Boolean model. Integration of TFs, histone 
modifications and DNA methylation may result in a more 
complex model but also such model will be more accurate and 
revealing [45]. Nevertheless, rapid progress in the field indicates 
that we will gradually be able to obtain more refined and dynamic 
view of pluripotency, self-renewal, lineage-specific commitment 
and differentiation, as well as better understand the process of 
iPSC reprogramming. Such views will enable the realization of 
pluripotent cell-based applications in regenerative medicine. 

Methods 

Cell culture and single cell RT-PCR analysis 

The Esrrb_R rescue cell line was derived from AINV-15 ESCs 
and cultured as previously described [25] in ESC media 
containing doxycycline (1 jig ml-1 Sigma). CCE mESCs are one 
of the first established stem cell lines [46] . They are derived from 
the 129/Sv mouse strain. All cells used in these experiments were 
under passage 80. Serum-free ESC cultures were performed as 
previously described [47,48]. Briefly, cells were maintained 
without feeders in serum-free N2B27 media prepared as described 
[47] and supplemented with LIF (Chemicon, Millipore) and 2i 
inhibitors [18]. The two inhibitors (Stemgent) block GSK3(3, 
(CHIR99021; 3 uM) and MEK1/2 (PD0325901; 1 uM). All 
cultures were maintained at 37°C with 5% CO2. Inventoried 
TaqMan assays (20 x, Applied Biosystems) were pooled to a final 
concentration of 0.2 x for each of the 96 assays. Single Esrrb_R 
cells expressing both GFP and SSEA-1 were FACS-sorted directly 
into 10 uL RT-PreAmp Master Mix (5.0 jiL CellsDirect 2x 
Reaction Mix, 2.5 jiL 0.2 x assay pool, 0.2 JiL SuperScript III 
RT/Platinum Taq Mix from the (CellsDirect One-Step qRT PCR 
Kits, Invitrogen) and 1.3 jiL TE buffer. Cell lysis and gene-specific 
reverse transcription were performed at 50°C for 20 min. Reverse 
transcriptase was heat-inactivated for 2 min at 95°C. Subsequent- 
ly, single cell cDNA was pre-amplified using a multiplexed, target- 
specific amplification protocol (denaturation at 95°C for 15 sec, 
and annealing and amplification at 60°C for 4 min for a total of 18 
cycles). Pre-amplified products were diluted 5 -fold prior to 
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Figure 6. Computational in silico knockdowns of all possible single and double perturbations linked to predicted lineage 
differentiation outcomes. (A, C) Predicted lineage specification based on dynamical model simulations. (B, D) Lineage predictions based on direct 
effects of knockdowns. Predictions are based on differentially expressed genes from LOF studies and promoter binding of transcription factors to 
differentially expressed genes based on mESC ChlP-chip/seq studies from the ESCAPE database. 
doi:1 0.1 371 /journal.pcbi.1 003777.g006 



amplification using a Universal PCR Master Mix and inventoried 
TaqMan gene expression assays (Applied Biosystems) in 96x96 
Dynamic Arrays on a BioMark System (Fluidigm). Amplification 
included a 10 min, 95 °G hot-start followed by 40 cycles of a two- 
step program consisting of 15 sec at 95°C and 60 sec at 60°C. Ct- 
values were calculated using BioMark Real-Time PCR Analysis 
Software v2.0 (Fluidigm). Values greater than 35 were considered 
non-detectable and recorded as 35. 

Short hairpin (sh)RNA design and plasmid construction 

Gene-specific 19nt shRNAs were designed based on a 
previously described algorithm using an in-house Perl script 
[49]. All shRNA sequences were BLASTed to ensure specificity. 
Synthesized oligomers were annealed and ligated into the 
pSuper.puro vector (Oligoengine). To make the Oct4/Nanog/ 
Esrrb combinatorial shRNA constructs, Clal-Xhol sites were used 



to insert HI -shRNA cassettes digested with BstBI-XhoI. The 
shRNA encoding sequences are: Oct4- GAAGGATGTGGTTC - 
GAGTA (shRNA_#l) and GC GAACTAGC ATTGAGAAC 
(shRNA_#2), Nanog- GAAGTATTCTTGCTTAGAA, Esrrb- 
GATTC GATGTAC ATTGAGA and Jarid2 - TCACTGTCC- 
TGGGAAATAA. 

Cell culture and transfection 

Mouse CCE ESCs were cultured feeder-free on 0.1% gelatin- 
coated plates in ESC media (Dulbecco's modified Eagle's medium 
(DMEM; Hi-Glucose), 15% fetal bovine serum, non-essential 
amino acids, L-glutamine, (3-mercaptoethanol, penicillin/strepto- 
mycin, sodium pyruvate and LIF (Millipore). Serum was 
purchased from HyClone. This serum is embryonic stem cell 
qualified and therefore does not require heat inactivation. The 
specific lot of serum was rigorously tested to ensure robust 
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self-renewal with little spontaneous differentiation as assessed by 
mESC morphologies and alkaline phosphatase staining. All cell 
cultures were maintained at 37°C with 5% C0 2 . Gene-specific or 
scrambled shRNA constructs, the GFP-shRNA construct and 
empty vector (all 3 ug) were transfected using Lipofectamine 2000 
(Invitrogen) according to the manufacturer's instructions. Trans- 
fected cells were selected for 48 hrs. in puromycin (1.5 ug/ml). 
Mock transfections resulted in no surviving cells after selection. 

Real-time quantitative PCR 

Total RNA was Trizol-extracted (Invitrogen), column-purified 
with RNeasy kits (Qiagen), and reverse transcribed using the High 
Capacity reverse transcription kit (Applied Biosystems). All 
quantitative PCR analyses were performed using the Fast SYBR 
Green Master Mix (Applied Biosystems) following the manufac- 
turer's protocol on the LightCycler480 Real-Time PCR System 
(Roche). Each PCR reaction generated a specific amplicon, as 
demonstrated by melting-temperature profiles (Dissociation Curve 
analysis). No PCR products were observed in the absence of 
template. Data were normalized to Gapdh and represented relative 
to empty- vector transfected controls. Primer sequences are 
available in Table S9. 

Western blot analysis 

Cells were scraped/trypsinized, washed in PBS and incubated 
for 20 min in cold RIPA buffer without SDS. Protein concentra- 
tions were determined using Bradford Dye (Bio-Rad). Total 
proteins (—10 ug) were separated on SDS-PAGE gels and 
transferred to PVDF membranes (Millipore). Membranes were 
probed with specific primary antibodies followed by HRP- 
conjugated secondary antibodies and developed with ECL 
(Amersham). Primary antibodies were: Oct4 (sc9081, Santa Cruz), 
Nanog (A300-398A, Bethyl), Actin (scl615, Santa Cruz), Sox2 
(scl7320, Santa Cruz) and Esrrb (300-748, Novus Biologicals). 
Amido-black was used to detect core histones. Quantification of 
protein bands was performed using Adobe Photoshop. Relative 
protein level differences were calculated by normalization to actin 
levels and shown relative to empty- vector transfected sample. 

Alkaline phosphatase staining assay 

An alkaline phosphatase detection kit (Stemgent) was used to 
measure activity according to the manufacturer's instructions. 

Signed directed network assembly 

From the ESCAPE database, a network containing 15 core 
pluripotency and 15 lineage-specific components was extracted. 
Arrows were established if there was evidence for binding of a 
specific TF to a target gene from mESC ChlP-chip/ seq studies as 
well as a change in target gene expression level after loss-of- 
function (LOF) or gain-of-function (GOF) of the same TF in 
mESCs. We applied a majority- voting function giving more weight 
to LOF than to GOF evidence and binarized the output as either 
activation (+1) or inhibition (— 1) using the sign function: 

a u = s § n (j2 El °f^ + 0,6 x Yl E s°f(i> J^j xIi J ( 1 ) 



Where Ei 0 f(i,j) is 1 for activation or —1 for inhibition 
according to the k th LOF study where gene i is depleted by 
RNAi or deleted by homologous recombination. E go f(i,j) is 1 for 
activation or otherwise — 1 for inhibition according to the p th GOF 



study where gene i is over-expressed. Iy C with value 1 if there 
exists at least one protein-target gene promoter binding interaction 
connecting transcription factor i to target gene j from ChlP-chip/ 
seq studies and 0 otherwise. A few links (8 out of 450 potential 
interactions) were manually refined in cases with contradictory 
evidence using information from small-scale functional studies 
(Table S2). 

K-means clustering 

Experimental data from the Fluidigm platform measure 
transcript abundance of up to 96 genes analyzed in 96 single 
cells. Ct-values were normalized to the housekeeping gene Gapdh 
levels in each cell and converted to binary values (1 for high 
expression and 0 for low expression) using K-means clustering 
with K = 2 (MATLAB, Bioinformatics Toolbox). Histogram 
curves for normalized Ct-values were smoothened using the 
Kernel smoothing algorithm (MATLAB, Statistics Toolbox). 
Hierarchical clustering for binarized and continuous expression 
levels of the 30 network genes in 96 individual cells were 
hierarchically clustered using the average-linkage clustering 
algorithm (MATLAB, Bioinformatics Toolbox). 

Dynamical simulations and comparison between in silico 
and experimental knockdowns 

The calibrated network inferred from the Boolean-function- 
learning-process is simulated using discrete Boolean dynamics with 
synchronous updating and the learned Boolean functions. An 
expression pattern \jj is defined as a state vector {0,1}" where i// t+ i 
is determined by \j/ t with the underlying Boolean functions as 
follows: i// t+l (vi)=fi(il/ t (vii), . . . ,i// t (vik)). One Boolean network 
consisting of 30 genes and a set of Boolean functions is sampled 
randomly from all learned Boolean functions calibrated through 
the learning workflow. For each in silico simulation setting, 10 
networks were sampled and the gene status (0 or 1) is recorded in 
Btj for each gene i in condition j as follows: for each sampled 
network p, 100 random initial conditions are evolved for 30 steps 
in synchronous mode. We set the step of 30 since all networks can 
reach steady states within a step of 30 in our case. In silico 
knockdowns are achieved by forcing gene(s) to the 'OFF' state all 
the time while recording the final state of the Boolean network. 
Each final state, as denoted by an attractor a t is recorded with 
weight w t based on the size of its basin (defined as the set of initial 
states that lead to an attractor). Since a network can reach multiple 
steady states with certain probabilities, we weighted each stable 
state a t by the probability of being in that particular state w t . Thus 
the final binary state of each gene i in each condition j for a 

{lif ^a it w t >Q.5 
0 if ^2 a- w <0 5 ' 

where au denotes the state of gene i in attractor t and w t is the 
weight of attractor t. Then By for 10 sampled networks is 
calculated as follows: 



Btj={ 



0if i(E^H 0 - 5 



lif io(E^)> 0 - 5 



10 



The formula for calculating the discordance score dy for gene i 
in condition j between simulation and experiment is as follows: 



PLOS Computational Biology | www.ploscompbiol.org 



12 



August 2014 | Volume 10 | Issue 8 | e1003777 



Regulatory Network of Embryonic Stem Cells 



1 +e~ 



{-abs(AACTij), if ABy x AACT^ >0 
abs(AACTij) - 1 , if ABy x AACTy = 0 
abs(AACTij), if A^ x AAC^- <0 



Where afe(»)is absolute value function; ABy reflects the 
expression change for gene i in condition j relative to unperturbed 
condition under in silico simulations, with values from { — 1,0,1} 
representing 'down-regulation', 'no-change' and c up-regulation', 
respectively; A ACT// reflects log 2 fold-change of gene expression 
in knockdown mESCs relative to empty-vector transfected controls 
measured by RT-PCR, with negative values representing 'up- 
regulation' and positive values representing 'down-regulation'. As 
the logistic function is monotonically increasing, larger value of t 
would result in larger discordance score. 

Quantifying agreement between experimental results 
and simulations results of networks learned from 
randomized single cell data/network topology 

Single cell expression data are gradually randomized by flipping 
data points with a certain percentage a. The topology of the 
network is randomized by rewiring a percent links represented as 
entries in the adjacency matrix underlying the 30 node network 
extracted from the ESCAPE database. An objective function L is 
defined to quantify the error between in silico knockdown 
simulations and the experimental results. L(a) = J^/ 2/ wnere 
dy is the same discordance score defined above. Relative accuracy 

is defined as ( 1 



, x 100%. For each 

# of total predictions 

a, permutation level and learning process is repeated 10 times 
to obtain means and standard deviations. 

Statistical significance of model prediction accuracy 

Mean discordance scores across all genes and perturbation 
conditions (30x7 = 210) from the model were compared to 
prediction results with a random predictor. This is a simple 
predictor with outputs randomly chosen from — 1 , 0 or +1 
representing 'down-regulation', 'no change' or 'up-regulation', 
respectively. A total of 500 random M 30x7 matrices were 
generated using the random predictor. Rows represent genes 
and columns represent perturbation conditions. Individual mean 
discordance scores were calculated for each of the 500 random 
matrices. A one-sample t-test was performed to test the null 
hypothesis that the random sample mean is equal to mean 
discordance scores from the model. 
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